Chapter 29
Taking Advantage of Perl
Perl is an interpreted language designed for scanning arbitrary
text files, extracting information from those text files, and
printing reports based on that information. It is also a good
language for many Web site system management tasks. This chapter
shows you how to use it for a few handy CGI scripts, including
Web site statistics. Web site statistics are perhaps more relevant
on an Internet Web site than on an Intranet, but whether or not
you decide to put the information in Chapter 28
to use and connect to the Internet, you will still be faced with
a need to automate countless server chores. If you
take the time to learn Perl, you will find that it is an ideal
employee for those tasks. And if you don't have time to learn
Perl, you still win, because many Perl scripts which you can use
as-is are available on the Internet.
For non-GUI systems programming, Perl fills the gap between low-level
programming languages, such as C, and high-level languages such
as AWK, SED, and the UNIX Shell. Although C is a very powerful
language, it has a steep learning curve. Perl does
not offer the speed of a compiled language (like C), but it does
offer very good string-handling capabilities and it is easier
to learn. Most people with experience in any of the languages
just mentioned will find Perl to be an easy migration.
Note: Here's some good news for those who like Perl but hate to give up performance. Hip Communications Inc., in Vancouver, BC, has recently announced PerlIS.dll for Microsoft IIS. PerlIS is an ISAPI version of the Perl interpreter, so the Web server is able to process Perl scripts at a greatly increased speed compared to traditional CGI. You can download the software and documentation for PerlIS and Perl for Win32 at this URL:
http://www.perl.hip.com/
Hip Communications maintains current versions of Perl 5.001m for Windows NT and Windows 95 using Visual C++ 4.x. PerlIS should have no problems running on the Purveyor 1.2 Web server from Process Software, and probably other ISAPI-compliant HTTP servers for NT as well.
WWWusage, which is available for free download at http://rick.wzl.rwth-aachen.de/RickG/WWWusage/wwwusage.html,
is an excellent example of a powerful CGI script written in Perl.
WWWusage was written by Richard Graessler to perform HTTP statistics
on Windows NT. Christopher Brown, with whom I co-authored Web
Site Construction Kit for Windows NT and Web Site Construction
Kit for Windows 95, gets credit for the original draft of
this chapter. Chris ported WWWusage to Windows 95, although Rick's
latest version of the program for Windows NT is discussed here.
Note: Although Perl is an acronym for Practical Extraction and Report Language, like the names of many other programming languages formed from acronyms, it isn't always written in uppercase letters.
Perl comes in two versions: Perl 4 and Perl 5. Version 5 is the
new kid on the block, and it comes with object-oriented extensions.
Perl was invented on UNIX by Larry Wall. Keep in mind that Perl
has always had strong roots on UNIX platforms, and most of the
Webmasters who use it and post public-domain Perl source code
work on UNIX. Fortunately, some nice folks have ported Perl to
Win32 and made it available by anonymous FTP.
Dick Hardt of Hip Communications led the porting of Perl 5 to
Windows NT/95. You can obtain the latest version of it from ftp://ntperl.hip.com/ntperl/.
This site contains the Visual C++ source code for Perl, binary
files, and documentation.
Tip: If you plan to install Perl 5 on Windows 95, the batch files that come with Perl will not work. You will have to make three registry entries manually; the file win95.txt that accompanies Perl 5 explains the process.
You can retrieve Perl 4 for Windows NT at the FTP site of Intergraph.
Point your Web browser or CuteFTP (see Chapter 7,
"Running the Intranet Web Server") to this URL: ftp://ftp.intergraph.com/pub/win32/perl/.
You may want to download the file ntperlb.zip
(if you only want the compiled version) or ntperls.zip
(if you want the Visual C++ 2.0 source code). Using Perl 4 on
Windows 95 requires a patch (which you can find at Yahoo) developed
by Bob Denny.
This book is already covering a great deal of information, and
I don't intend to deluge you with a complete course on the Perl
language. What this chapter does do is give you a quick introduction
to the Perl syntax and show you how to put some Perl scripts to
work so you can jump right in. For more information about Perl,
please see Teach Yourself Perl 5 in 21 Days, 2nd Edition
by David Till, published by Sams.
Note: Much of the material in this section was originally contributed by Gary E. Major (Systems Analyst) at Seattle Pacific University. You can visit his site on the Web for the latest information: http://www.spu.edu/tech/basic-perl/
There is a lot to cover and only one chapter in which to cover it. I caution
you that this material is not intended for those who are new to
programming. In fact, I am going to take somewhat of a hit-and-run
approach and present the material largely in reference format.
The second half of the chapter contains a very useful sample application.
Symbols
Table 29.1 presents the most common symbols unique to Perl and
their meaning.
Table 29.1. Common Perl symbols.
Symbol | Purpose
$ | For scalar values.
@ | For indexed arrays.
% | For hashed arrays (associative arrays).
* | For all types of that symbol name. These are sometimes used like pointers in Perl 4, but Perl 5 uses references.
<> | Used for inputting a record from a filehandle.
\ | Takes a reference to something.
Script Components
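This section lists the basic components of a Perl script. By
convention, the first line of a Perl script is a special comment
that identifies the file location of the Perl interpreter itself.
UNIX shells and some Web servers use this line to find the interpreter;
it is not needed when you pass the script name to perl on the command line.
For example:
#!/usr/local/bin/perl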
- Comment lines begin with the #
character.
- Commands end with the ;
character.
- Subroutines are contained in braces, for example, sub
subroutine_name { }.
This list shows the predefined data types:
- Scalars
Names begin with the $ character
(for example, $scalarname).
Used for characters, strings, and numbers.
- Arrays
Names begin with the @ character
(for example, @arrayname).
Individual items referenced: $arrayname[n].
Used for lists of characters, strings, and numbers.
Note that the list can have a mixture of characters, strings,
and numbers.
- Hashes (also known as associative arrays)
Names begin with the % character
(for example, %hashname).
Individual items referenced: $hashname{'key'}.
Handles key/value combinations:
"Gary Major", "Systems Analyst"
"Phil Rand", "Senior Systems Analyst"
"E. Arthur Self", "Who?"
Special Variables and Characters
Table 29.2 lists several predefined variables and reserved characters
in Perl.
Table 29.2. Predefined Perl variables.
Variable | Purpose
$0 | Contains the name of the script being executed.
$_ | Default input and pattern search variable.
$/ | Input record separator, newline by default.
@ARGV | Contains command-line arguments. $ARGV[0] is the first argument.
@INC | Contains the list of places to look for scripts to be evaluated by the do or require commands.
%INC | Contains entries for each file included by the do or require commands.
%ENV | Contains your environment settings. Changes made affect child processes.
STDIN | Default input stream.
STDOUT | Default output stream.
STDERR | Default error stream.
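As a brief illustration of the variables in Table 29.2, here is a minimal sketch. The helper name count_matching and the sample file names are made up for this example. Note that a single element of @ARGV or %ENV is a scalar, so it is read with the $ prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines that match a pattern.
# With no loop variable named, foreach places each item in $_,
# and a bare /pattern/ match tests $_.
sub count_matching {
    my ($pattern, @lines) = @_;
    my $count = 0;
    foreach (@lines) {
        $count++ if /$pattern/;
    }
    return $count;
}

print "Script name: $0\n";
print "Arguments given: ", scalar(@ARGV), "\n";
print "First argument: $ARGV[0]\n" if @ARGV;

# %ENV holds the environment; PATH is normally set, but check first.
print "PATH is set\n" if exists $ENV{'PATH'};

# Diagnostics go to STDERR so they stay out of the normal output.
print STDERR "HTML files seen: ",
    count_matching('\.html$', "/index.html", "/logo.gif"), "\n";
```

Run it with one or two arguments (for example, perl scriptname one two) to see @ARGV filled in.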
Arithmetic Operators
Table 29.3 lists the common mathematical operators.
Table 29.3. Perl mathematical operators.
Operator | Example | Meaning
+ | $a + $b | Sum of $a and $b
- | $a - $b | Difference of $a and $b
* | $a * $b | Product of $a times $b
/ | $a / $b | Quotient of $a divided by $b
% | $a % $b | Remainder of $a divided by $b
** | $a ** $b | $a to the power of $b
Assignment Operators
Perl supports a rich array of assignment operators for many purposes.
If the list in Table 29.4 seems overwhelming, try to stick to
the easy ones and learn about the others after you have more experience
with Perl programming.
Table 29.4. Perl assignment operators.
Operator | Example | Meaning
= | $var = 5 | Assign 5 to $var
++ | $var++ or ++$var | Increment $var by 1 and assign to $var
-- | $var-- or --$var | Decrement $var by 1 and assign to $var
+= | $var += 3 | Increase $var by 3 and assign to $var
-= | $var -= 2 | Decrease $var by 2 and assign to $var
.= | $str .= "ing" | Concatenate "ing" to $str and assign to $str
*= | $var *= 4 | Multiply $var by 4 and assign to $var
/= | $var /= 2 | Divide $var by 2 and assign to $var
**= | $var **= 2 | Raise $var to the second power and assign to $var
%= | $var %= 2 | Divide $var by 2 and assign remainder to $var
x= | $str x= 20 | Repeat $str 20 times and assign to $str
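To see several of the assignment operators from Table 29.4 working together, consider this small sketch. The subroutine name assignment_demo and the starting values are arbitrary choices for illustration; the comments show the value held after each step.

```perl
use strict;
use warnings;

# Hypothetical demonstration: apply a chain of assignment operators
# and return the final number and string.
sub assignment_demo {
    my $var = 5;
    $var += 3;     # 8
    $var *= 4;     # 32
    $var /= 2;     # 16
    $var **= 2;    # 256
    $var %= 7;     # 4, the remainder of 256 divided by 7
    $var++;        # 5

    my $str = "ab";
    $str .= "c";   # "abc"
    $str x= 2;     # "abcabc"

    return ($var, $str);
}

my ($number, $string) = assignment_demo();
print "$number $string\n";
```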
Logical Operators
The logical operators in Perl (shown in Table 29.5) are useful
in if statements, which are typical
of nearly all programming languages.
Table 29.5. Perl logical operators.
Operator | Example | Meaning
&& | $a && $b | True if $a is true and $b is true
|| | $a || $b | True if $a is true or if $b is true
! | ! $a | True if $a is not true
Pattern-Matching Operators
Pattern matching is one of the areas in which Perl shows its strength.
These operators, shown in Table 29.6, are very useful for string
operations.
Table 29.6. Perl pattern-matching operators.
Operator | Example | Meaning
=~ // | $a =~ /pat/ | True if $a contains pattern pat
=~ s/// | $a =~ s/p/r/ | Replace the first occurrence of p with r in $a (add the g modifier to replace every occurrence)
=~ tr/// | $a =~ tr/a-z/A-Z/ | Translate each character in the first list to the corresponding character in the second list (here, lowercase to uppercase)
!~ // | $a !~ /pat/ | True if $a does not contain pattern pat
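The following sketch exercises each of the pattern-matching operators in Table 29.6. The helper name tidy and the sample text are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: clean up a line of text with s/// and tr///.
sub tidy {
    my ($line) = @_;
    $line =~ s/\s+$//;           # substitute: strip trailing whitespace
    $line =~ s/colour/color/g;   # the g modifier replaces every occurrence
    $line =~ tr/a-z/A-Z/;        # translate: lowercase to uppercase
    return $line;
}

my $text = "the colour of perl  ";
print "mentions perl\n" if $text =~ /perl/;       # match succeeds
print "has no digits\n" if $text !~ /[0-9]/;      # negated match succeeds
print tidy($text), "\n";
```

Because the operators work on a copy inside tidy, the original $text is left unchanged.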
String Operators
String operators in Perl, as shown in Table 29.7, are the mainstay
of the language.
Table 29.7. Perl string operators.
Operator | Example | Meaning
. | $a . $b | Concatenate $b to the end of $a
x | $a x $b | Value of $a strung together $b times
substr() | substr($a, $o, $l) | Substring of $a at offset $o of length $l
index() | index($a, $b) | Offset of string $b in string $a
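Here is a short sketch that combines the four string operators from Table 29.7. The helper names base_name and underline are hypothetical. Note that index() returns -1 when the string is not found, which the first helper checks for.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of a file name before the first dot.
sub base_name {
    my ($name) = @_;
    my $dot = index($name, ".");
    return $dot < 0 ? $name : substr($name, 0, $dot);
}

# Hypothetical helper: return a title followed by a row of dashes
# of the same length.
sub underline {
    my ($title) = @_;
    return $title . "\n" . ("-" x length($title)) . "\n";
}

print underline("Usage for " . base_name("wwwusage.pl"));
```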
Relational Operators
The relational operators shown in Table 29.8 are essential to
if and while
statements.
Table 29.8. Perl relational operators.
Numeric Operator | String Operator | Example | Meaning
== | eq | $str eq "Word" | Equal to
!= | ne | $str ne "Word" | Not equal to
> | gt | $var > 10 | Greater than
>= | ge | $var >= 10 | Greater than or equal to
< | lt | $var < 10 | Less than
<= | le | $var <= 10 | Less than or equal to
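The two columns of operators in Table 29.8 can give different answers for the same values, because the string operators compare character by character. The sketch below (with hypothetical helper names) shows the difference using 10 and 9.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two values as numbers.
sub compare_numbers {
    my ($x, $y) = @_;
    return $x > $y ? "greater" : $x < $y ? "less" : "equal";
}

# Hypothetical helper: compare two values as strings.
sub compare_strings {
    my ($x, $y) = @_;
    return $x gt $y ? "greater" : $x lt $y ? "less" : "equal";
}

# As numbers, 10 is greater than 9. As strings, "10" sorts before "9"
# because the character "1" comes before the character "9".
print "numeric: 10 is ", compare_numbers(10, 9), " than 9\n";
print "string:  10 is ", compare_strings("10", "9"), " than 9\n";
```

Choosing the wrong column is a common source of subtle bugs, so use the numeric operators for counts and sizes and the string operators for names.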
Basic Perl Commands
Here are several predefined Perl commands that you will come across
repeatedly.
- print FILEHANDLE LIST
Prints a string or comma-delimited list of strings to FILEHANDLE
(STDOUT is the default).
Example: print FILE "This is line 1\n", "This is line 2\n";
- printf FILEHANDLE FORMAT, LIST
Prints a formatted string to FILEHANDLE (STDOUT is the default).
Example: printf FILE "%s %d\n", "This is line", 3;
printf formatting specifiers are as follows:
Conversion Character | Definition
%s | String
%c | Character
%d | Decimal number
%ld | Long decimal number
%u | Unsigned decimal number
%lu | Unsigned long decimal number
%x | Hexadecimal number
%lx | Long hexadecimal number
%o | Octal number
%lo | Long octal number
%e | Floating-point number in scientific notation
%f | Floating-point number
- open(FILEHANDLE, EXPR)
Opens a real file, EXPR, and attaches it to FILEHANDLE.
If EXPR is omitted, the scalar variable with the same name as
FILEHANDLE must contain the filename.
Example:
open(FILE, "this-is-a-long-filename.txt");
EXPR definitions appear in the following table:
Open as | EXPR
Read only | "filename"
Write only | ">filename"
Read and write | "+>filename" (truncates the file first; use "+<filename" to keep its contents)
Append | ">>filename"
Pipe in | "unix command |"
Pipe out | "| unix command"
- close(FILEHANDLE)
Closes the file, socket, or pipe associated with FILEHANDLE.
Example: close(FILE);
- chop(LIST|VARIABLE)
Chops off the last character of a string, VARIABLE,
or the last character of each item in LIST,
and returns the last character removed.
Examples:
chop($scalar);
chop(@array);
- `unix command` (the quotes
are backtick characters)
Executes the text within the backticks as if it were typed at
the system command prompt and returns its output. Example:
$date = `date`;
- &SUBROUTINE(LIST)
Executes a subroutine, which returns the value of
the last expression evaluated. Subroutines can accept arguments
through LIST. Subroutines
are global and can be defined anywhere in the script, even in
another file (Perl module). Example:
&printname($first, $last);
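The commands above can be combined into a short working script. This is only a sketch: the subroutine name write_and_read and the scratch file name are assumptions made for the example. It uses the same two-argument open and FILE handle style described in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: write three lines to $path, read them back,
# and return them with the trailing newline chopped off each one.
sub write_and_read {
    my ($path) = @_;

    open(FILE, ">$path") || die "Cannot write $path: $!";
    print FILE "This is line 1\n", "This is line 2\n";
    printf FILE "%s %d\n", "This is line", 3;
    close(FILE);

    open(FILE, $path) || die "Cannot read $path: $!";
    my @lines = <FILE>;     # read every record from the filehandle
    close(FILE);

    chop(@lines);           # remove the newline from each item
    return @lines;
}

my $scratch = "perl-example-$$.txt";   # assumed scratch file name
my @result  = &write_and_read($scratch);
unlink($scratch);                      # remove the scratch file
print join(" | ", @result), "\n";
```

Checking the result of open with || die is a good habit, since a script that silently fails to open a log file produces empty statistics.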
Flow of Control Statements
- if .. elsif .. else
Examples:
if (Expression) { Block }
if (Expression) { Block } else { Block }
if (Expression) { Block } elsif (Expression) { Block } else {
Block }
- unless construct
Examples:
unless (Expression) { Block }
unless (Expression) { Block } else { Block }
unless (Expression) { Block } elsif (Expression) { Block } else
{ Block }
- while loop
Example:
while (Expression) { Block }
- until loop
Example:
until (Expression) { Block }
- do ... until loop
Example:
do { Block } until (Expression)
- do ... while loop
Example:
do { Block } while (Expression)
- for loop
Example:
for (Expression1; Expression2; Expression3) { Block }
Note: Expression1 is used to set the initial value of the loop variables. Expression2 is used to test whether the loop should continue or stop. Expression3 is used to update the loop variables.
- foreach loop
Example:
foreach VARIABLE (ARRAY) { Block }
Note: VARIABLE is local to the foreach loop and regains its former value when the loop terminates. If VARIABLE is missing, the special scalar $_ is used.
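The control statements listed above can be seen in action in the following sketch. The subroutine names classify, sum_to, and halvings are hypothetical and exist only to give each construct something to do.

```perl
use strict;
use warnings;

# if .. elsif .. else
sub classify {
    my ($n) = @_;
    if ($n < 0)     { return "negative"; }
    elsif ($n == 0) { return "zero"; }
    else            { return "positive"; }
}

# for loop: add the numbers from 1 to $limit
sub sum_to {
    my ($limit) = @_;
    my $total = 0;
    for (my $i = 1; $i <= $limit; $i++) {
        $total += $i;
    }
    return $total;
}

# until loop: count how many times $n can be halved before reaching 1
sub halvings {
    my ($n) = @_;
    my $steps = 0;
    until ($n <= 1) {
        $n = int($n / 2);
        $steps++;
    }
    return $steps;
}

# foreach loop over a list of values
foreach my $n (-3, 0, 8) {
    print "$n is ", classify($n), "\n";
}
print "Sum of 1 to 4: ", sum_to(4), "\n";
print "Halvings of 8: ", halvings(8), "\n";
```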
Perl Modules
A Perl module is a set of functions grouped into a package that
deal with a similar problem. You can use module functions in a
Perl script by telling your script the name of the module with
the use command. For example:
use CGI;
One example of a Perl module is the CGI.pm
module. This file includes functions that provide an easy interface
to CGI programming, enabling you to write HTML forms and easily
deal with the results. For more information about CGI.pm,
visit its home page at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html.
This page has information about the functions available and examples
of how they are used.
Executing the Script
To run a Perl program, you can type the script name at the command
prompt. Here are several example commands that you can use for
debugging scripts:
- To check syntax:
perl -c scriptname
- To generate warnings:
perl -w scriptname
- To check syntax and generate warnings:
perl -cw scriptname
- To run the debugger:
perl -d scriptname
Two of the most common uses of Perl by Webmasters are statistical
analysis and forms processing. This section and the next present
two Perl CGI scripts that prove very useful for these purposes.
As a Webmaster, you want to know who's coming to your site, how
often, and what they are doing there. To accomplish this, the
examples use the Perl programming language interpreter and the
WWWusage CGI application.
Actually, before getting into Perl, let's mention a very interesting
tool that can help you chart your Web site statistics without
requiring any custom programming. It will analyze your Web page
usage based on your server log files. A company called Logical
Design Solutions has invented a cool program called WebTrac. You
can download the free program and give it a try. Visit their home
page at http://www.lds.com.
WWWusage is a Perl script written by Richard Graessler (rickg@pobox.com)
to analyze and calculate monthly usage statistics from log files
generated by World Wide Web servers. This application is designed
for use with Windows NT. Once the script is customized for your
Web server (which is easy to do), WWWusage should work on any
Windows NT system with NT Perl 5.001 installed.
WWWusage will generate a new statistics page each month and the
output of WWWusage is easy to read. For more information about
WWWusage and to download a free copy, please visit Rick's Web
page. It also contains many other interesting resources and Perl
scripts for Windows NT:
http://rick.wzl.rwth-aachen.de/rick/
Tip: Remember that some Web server statistics tools require the Web server to close the log files before the files can be analyzed. This is true of IIS 2.0.
WWWusage will process HTTP access log files in the Common Logfile
Format and output monthly statistics in HTML format ready for
publishing on the Web. It creates reports on any or all of the
following:
- Transfers by HTTP method (total)
- Transfers for each status code (total)
- Daily transmission statistics (total)
- Weekday transmission statistics (total)
- Hourly transmission statistics (total)
- Transfers by client domain, top level (top xx and total)
- Transfers by client domain, second level (top xx and total)
- Transfers by client subdomain (top xx and total)
- Transfers by client host (top xx and total)
- Transfers by file type (top xx and total)
- Transfers by file name (top xx and total)
- Total transfers to each remote identifier (total)
WWWusage does not make any changes to the access log files or
write any files in the server directories (with the exception
of two output HTML files per month).
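To give a feel for how a report such as "transfers by client domain, top level" can be produced, here is a minimal sketch. It is not taken from WWWusage; the helper names and the short host list are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the top-level domain of a host name,
# or "unresolved" when only an IP number is available.
sub top_level_domain {
    my ($host) = @_;
    return "unresolved" if $host =~ /^\d+\.\d+\.\d+\.\d+$/;
    my @parts = split(/\./, lc($host));
    return $parts[-1];
}

# Hypothetical helper: count the hosts in each top-level domain.
sub count_by_domain {
    my %count;
    foreach my $host (@_) {
        $count{ top_level_domain($host) }++;
    }
    return %count;
}

my @hosts = ("rick.wzl.rwth-aachen.de", "www.spu.edu",
             "ftp.intergraph.com", "137.226.92.10", "www.lds.com");
my %total = count_by_domain(@hosts);

# Print the domains, busiest first; ties are listed alphabetically.
foreach my $domain (sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total) {
    printf "%-12s %d\n", $domain, $total{$domain};
}
```

A hash keyed by the item being counted is the usual Perl idiom for this kind of tally, and the same pattern would apply to status codes, file types, or hours of the day.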
Log File Formats
Gone are the days when every Web server used its own proprietary
log file format (but for a few notable exceptions). Numerous formats
made it very difficult to write general statistics collectors.
Therefore, the Web community designed the Common Logfile Format,
which will soon become the default, if it hasn't already.
The Common Logfile Format
Here is the format of each line in the logfile, followed by an
explanation of each field:
remotehost rfc931 authuser [date] "request" status bytes
- remotehost: Remote host name (or IP number if the DNS host name is not available, or if DNS
lookup of the HTTPS server is disabled).
- rfc931: The remote logname of the user.
- authuser: The user name by which the user has authenticated himself, or "-"
if not available.
- [date]: Date and time of the request, with the time zone offset from GMT at the end:
[DD/Mon/YYYY:hh:mm:ss [+/-]HHMM].
- "request": The request line exactly as it came from the client, using the format
"method file httpversion".
  - method: The method can be GET, HEAD, POST, or none.
  - file: The full file path and arguments of the requested file. The file path is either relative
to the disk root or relative to the HTTPS document root.
  - httpversion: The version number of the HTTP specification, for example, "HTTP/1.0"
for the current spec.
- status: The HTTP status code returned to the client (three digits) or "-"
if not available.
- bytes: The content-length of the document transferred in bytes or "-"
if not available.
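The field layout above maps naturally onto a Perl regular expression. The sketch below is one possible way to split a line into its fields; it is not the code WWWusage uses, and the helper name parse_clf and the sample log line are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split one Common Logfile Format line into a hash
# of named fields. Returns an empty list if the line does not match.
sub parse_clf {
    my ($line) = @_;
    return () unless $line =~
        /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\S+) (\S+)\s*$/;

    my %field = (
        remotehost => $1,
        rfc931     => $2,
        authuser   => $3,
        date       => $4,
        request    => $5,
        status     => $6,
        bytes      => $7,
    );

    # The request itself is "method file httpversion".
    my ($method, $file, $version) = split(' ', $field{request});
    @field{'method', 'file', 'httpversion'} = ($method, $file, $version);
    return %field;
}

# A made-up sample line in the format described above.
my $sample = 'ppp12.example.com - - [26/Dec/1995:10:15:32 +0100] '
           . '"GET /rick/index.html HTTP/1.0" 200 2048';

my %entry = parse_clf($sample);
print "$entry{remotehost} requested $entry{file} ",
      "(status $entry{status}, $entry{bytes} bytes)\n";
```

Lines that do not match (for example, lines damaged when a log was cycled) return an empty list, so a caller can skip them and keep going.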
Note: Microsoft IIS does not write HTTP logfiles using the Common Logfile Format. However, IIS does include a simple command-line utility which can convert from the IIS format to the Common Logfile Format. You can then use WWWusage to process the results.
Web servers differ in how they manage log files. Some use a single
log file. Some write logfiles that can be closed automatically
(sometimes called cycled), while others must be closed manually.
Still others write a separate log file for each day, so there is
no need to cycle the log file.
Other Perl Statistic Scripts
Before we get to WWWusage, let's take a quick look at some other
great Perl analysis scripts for HTTP log files. Just check Appendix
C, "Resources for the Windows NT Webmaster," or search
Yahoo for CGI or Perl.
- Roy Fielding's wwwstat.
Works with Common Logfile Format under NT Perl, if some minor
modifications are done to remove some UNIX NCSA HTTPD specifics.
- Steven Nemetz's iisstat.
Works with both the Common and EMWAC Logfile Format under NT Perl.
- Nick Phillips's musage.
Written for NT HTTPS and Perl. It works great with the EMWAC Logfile
Format, but there are still some minor bugs with the Common Logfile
Format.
Configuring WWWusage
Listing 29.1 shows the configuration section from the top of the
file wwwusage.pl. All you
need to do is read the comments in the source code to determine
the modifications you need to make to customize the program for
your site.
Listing 29.1. This excerpt of WWWusage shows the lines that
can be modified for your Web site.
#!/cgi32/perl
#
# WWWusage - Perl script to calculate monthly usage statistics
# from log files
# generated by the Windows NT World Wide Web servers (https).
#
# Copyright (c) 1995 Richard Graessler (rickg@pobox.com)
#
# For the latest version, DOCUMENTATION and LICENSE AGREEMENT see
# <URL: http://pobox.com/~rickg/rickg/wwwusage/wwwusage.html>
#
# This program is provided "AS IS", WITHOUT ANY WARRANTY
# (see License Agreement)
#
# Bug reports, comments, questions and suggestions are welcome.
# Please mail to
# rickg@pobox.com with the "subject: WWWusage" but please
# check first that you have the latest version.
#
# CREDITS:
#
# There are some other Perl logfile analyse scripts on the net:
# Roy Fielding's wwwstat
# <URL: http://www.ics.uci.edu/WebSoft/wwwstat/>
# Nick Phillips's musage
# <URL: http://www.blpes.lse.ac.uk/misc/musage.htm>
# Steven Nemetz's iisstat
# <URL:
# ftp://ftp.ccmail.com/pub/utils/InternetServices/iisstat/iisstat.html>
# Looking into these scripts helped me to write this script and
# there might be still some parts based on them.
#
# Requires timelocal.pl and getopts.pl which are included in the Perl
# distribution package.
#
# Thanks to the authors!
#
######################################################################
# Program internal variables (please do not change!)
######################################################################
$VERNAME = 'WWWusage'; # Program name
$VERSION = '0.99'; # Program version
$VERDATE = '26 December 1995'; # Program version date
######################################################################
# Present setting
######################################################################
# In Perl for Windows NT you can use forward slash (/) or double
# backslash (\\) in pathnames (e.g. C:/LOGS/ or c:\\LOGS\\). File
# and path names could
# be absolute(e.g. C:/LOGS/) or relative to current directory
# (e.g. ./LOGS/).
# hostname of www server (HTTPS)
$ServerName = 'rick.wzl.rwth-aachen.de';
# flag - specifies the logfile format
# 1 : common log file format,
# 0 : EMWAC HTTPS
$LogFormat = 1;
# file containing the country-codes to allow expansion from domain
# to country name
$CountryCodeFile = 'C:/www/alibaba/admin/country-codes.txt';
# Pattern used to recognise log files translated into a Perl regular
# expression, e.g. ('.+\.log' for *.log), ('ac.+\.log' for ac*.log).
# If your HTTPS have only one logfile simply set "access.log"
# Note: If you have more than one logfile the script assumes that the
# alphabetical order of the filenames is the same as the
# chronological order
$LogFilePattern = '.+\.log';
# directory containing external configuration files
# (without ending slash!)
$ConfigFileDir = 'c:/www/alibaba/admin';
# directory containing log files (without ending slash!)
$LogFileDir = 'c:/www/alibaba/logs/HTTP';
# filename (incl. path and arguments if necessary) of shell for
# unpacking archives. Note: If you use this feature please note
# that the archive contains only the logfiles for a single month
# and that you didn't analyse archives and normal logfile at
# the same time.
$Gzip = 'gunzip -c'; # Gzip Format: *.gz, *.Z
$Zip = 'unzip -p'; # Zip Format: *.zip
$Tar = 'tar -x -O -f'; # Tar Format: *.tar
# WWWusage directory to write statistics reports
# (without ending slash!)
$OutPutDir = 'c:/www/alibaba/htmldocs/usage/wwwusage';
# WWWusage Error file name including path
$ErrorFile = 'c:/www/alibaba/admin/WWWusage.log';
# Filename without extension for HTML main output file
# (e.g. "WWWusage", "index" or "default")
$MenuFile = "index";
# Extension for HTML output files
$HTMLextension = "html";
# show top nn statistics in main output, the detail output
# contains all (e.g. 30)
$Top = 30;
# format of the output HTML page (0 = <PRE></PRE>, 1 = <TABLE></TABLE>)
$HTMLOutput = 0;
# flags - disable if you don't want that output
$DoDomain = 1; # Transfers by Client Domain (top level)
$DoDomain2 = 1; # Transfers by Client Domain (second level)
$DoSubdomain = 1; # Transfers by Client Subdomain
$DoHost = 1; # Transfers by Client Host
$DoFileType = 1; # Transfers by File Type
$DoFileName = 1; # Transfers by File Name (URL)
$DoHTTPSMethod = 1; # Transmission Statistics HTTPS Method
$DoStatusCode = 1; # Transmission Statistics Status Code
$DoDaily = 1; # Transmission Statistics Day
$DoWeekdaily = 1; # Transmission Statistics Weekday
$DoHourly = 1; # Transmission Statistics Hour
$DoIdent = 2; # Transfers by Remote Identifier
# NOTE for $DoIdent: For security reasons, you should not
# publish to the web any report that lists the Remote Identifiers
# (rfc931 or authuser):
# 0 : no display, 1 : real user name, 2 : cookie name
# flag - disable if you don't want to create the detail statistics
# to save time
$DoDetail = 1;
# flag - disable if you don't want to create links to your
# accessed pages
$FileNameHREF=0;
# user specific parameters for the TABLE tag
$HTMLTable = 'Border=2 CELLPADDING=8 CELLSPACING=5';
# user specific backgrounds for all returns. Here you can set
# all elements of the body tag which can appear between "<BODY ... >
# in HTML format.
$HTMLBackground =
'BACKGROUND="/gif/bg0.gif" BGCOLOR="#63637b" TEXT="#ffffff" '
.'LINK="#00ffff" ALINK="#ff0000" VLINK="#ffff00" HRCOLOR="#ff0000" ';
# user specific header for all returned HTML pages in HTML format
@HTMLHeader = (
'<P><CENTER><A HREF="/image/ntrick.map"><IMG BORDER=0 HSPACE=10 ',
'ALIGN=MIDDLE SRC="/gif/ntrick.gif" '
.'ALT="Rick\'s Windows NT Info Center"',
' ISMAP WIDTH=550 HEIGHT=44></A></CENTER></P>',
'<H1><CENTER>World Wide Web Server Usage Statistic</CENTER></H1><HR>'
);
# user specific footer for all returned HTML pages in HTML formats
@HTMLAddress = (
'<HR><HR><A NAME="Bottom"></A><A HREF="/image/address.map" >',
'<IMG BORDER=0 HSPACE=10 ALIGN=MIDDLE SRC="/gif/address.gif" ',
'ALT="Addressbar" ISMAP WIDTH=293 HEIGHT=31></A>'
);
# flag - disable if you don't want a detailed output on the console
$VerboseMode = 1;
# flag - disable if you don't want to see the skipped lines of
# the logfiles on the console
$ShowSkippedLines = 1;
# flag - disable if you don't want to show unresolved addresses
$ShowUnresolved = 1;
# file containing DNS names
# (will be created and updated by the script)
$DnsNamesFile = 'c:/www/alibaba/admin/dns-names.txt';
# flag - to set the DNS lookup. Note: DNS lookup needs much time
# and slow up the execution of WWWusage
# 0 : disable if you don't want to look up dnsname if ip address
# is given
# 1 : if you don't want to look up new dnsname but used the
# saved dnsnames
# 2 : if you want to look up new and old unresolved dnsname
# 3 : if you want only to look up new dnsname
$LookupDnsNames = 3;
# flag - disable if you don't want to sort the host list to save time
$SortHostList = 0;
# flag - disable if you don't want to encode filenames
$UrlEncode = 0;
# flag - disable if you don't want to detect on disk if filename is a
# directory or file. If flag is set, you should run the script on
# your HTTPS machine
$FileCheck = 0;
# flag - enable it if your HTTPS automatically adds a "/" to
# slashless dirs (1 for EMWAC HTTPS, Netscape - 0 for Alibaba)
$DirWorksWithSlash = 0;
# real directory name of document root of the www server
# (without ending Slash!)
$DocumentRoot = 'c:/www/alibaba/htmldocs';
# list of configured "default/index" filename(s) for your HTTPS
@DefaultHTML = ('index.html','index.sht','default.htm');
# flag - enable to convert all filenames (URLs) to lower case
$FileNamesToLowerCase = 1;
# time zone information. Only necessary for EMWAC log file format.
# if not set it will be computed. Format: "+0100" or "-1100"
# $TimeZone = "+0100";
# exclude filter: optional list of IP addresses to ignore, please
# include ipnummer as well as dns name(s) in the list! IPnumber will
# be checked forward, DNSnames will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" for "137.226.*.*", "rwth-aachen.de" for
# "*.wzl.rwth-aachen.de")
# @IgnoreHost = ('137.226.92.10','rick.wzl.rwth-aachen.de');
# include filter: optional list of IP addresses to focus on, please
# include ipnummer as well as dns name(s) in the list! IPnumber will
# be checked forward, DNSnames will be checked backward. Perl
# expressions are possible.
# (e.g. "137.226" for "137.226.*.*", "wzl.rwth-aachen.de" for
# "*.wzl.rwth-aachen.de")
# @FocusOnHost = ('137.226.','wzl.rwth-aachen.de');
# exclude filter: optional list of paths/files to ignore. Paths
# will be checked forward from the beginning of the url filename.
# Perl expressions are possible.
# @IgnorePath = ('/gif/','/images/');
# include filter: optional list of paths/files to focus on. Paths
# will be checked forward from the beginning of the url filename.
# Perl expressions are possible.
# @FocusOnPath = ('/rick/');
# exclude filter: optional list of file extensions to ignore.
# Extension will be checked backward from the beginning of the
# url filename. Perl expressions are possible.
@IgnoreExt = ('gif','jpeg','jpg');
# include filter: optional list of file extensions to focus on.
# Extension will be checked backward from the beginning of the
# url filename. Perl expressions are possible.
# @FocusOnExt = ('.htm','html');
# Alias list for virtual paths.
# Format: '/aliasname/' => 'drive:/path/'
# Key: alias or pathnames relative to HTTPS document root.
# Value: pathnames relative to disk root.
# Do not include the $DocumentRoot (with its value '/'). This array
# does not make sense with EMWAC HTTPS because it doesn't
# support alias.
%WWWAlias = (
'/ALIBABA/', 'C:/WWW/ALIBABA/DOCS/',
'/ALIPROXY/', 'C:/WWW/ALIBABA/HTML/',
'/ICONS/', 'C:/WWW/ALIBABA/ICONS/',
'/IMAGE/', 'C:/WWW/ALIBABA/CONF/',
'/COUNTER/', 'C:/WWW/ALIBABA/COUNTER/',
'/CFDOCS/', 'C:/WWW/ALIBABA/CFUSION/CFDOCS/',
'/PERFORM/', 'C:/WWW/ALIBABA/PERFORM/DOCS/',
'/RICK/PERFORM/', 'C:/WWW/ALIBABA/PERFORM/OUTPUT/',
'/CGI-BIN/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGIDOS/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI-32/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI32/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/CGI-SHL/', 'C:/WWW/ALIBABA/CGI-BIN/',
'/WINCGI/', 'C:/WWW/ALIBABA/WINCGI/',
'/WINBIN/', 'C:/WWW/ALIBABA/WINCGI/',
'/DLLALIAS/', 'C:/WWW/ALIBABA/CGIDLL/',
'/ALIPROXY/', 'C:/WWW/ALIBABA/HTML/',
);
# List of used file types and its extensions. The extensions must
# be written in regular Perl expression. If @FileTypesSort is
# given it determine the search order.
%FileTypes = (
'CGI Scripts', '(\/cgi32\/|\/cgi-32\/|\/cgi-shl\/)',
'DOS CGI Scripts', '(\/cgi-bin\/|\/cgidos\/)',
'WinCGI Scripts', '(\/wincgi\/|\/winbin\/)',
'DllCGI Scripts', '\/dllalias\/',
'Images', '\.(bmp|gif|xbm|jpg|jpeg)$',
'Movies', '\.(mpg|mov|scm)$',
'Archive Files', '\.(gz|z|zip|tar)$',
'HTML Files', '\.(htm|html)$',
'Imagemaps', '($\/image\/|\.map$)',
'Server Side Includes', '\.(sht|shtm|shtml)$',
'Text Files', '\.txt$',
'Binary Executables', '\.(com|exe)$',
'Script Executables', '\.(pl|sh|cmd|bat)$',
'Readme Files', '\/README.*$',
'Directory Listings', '\/$',
'Java Applets', '\.CLASS$',
);
@FileTypesSort = (
'HTML Files',
'Images',
'CGI Scripts',
'Server Side Includes',
'Java Applets',
'Text Files',
'Directory Listings',
'DOS CGI Scripts',
'WinCGI Scripts',
'DllCGI Scripts',
'Movies',
'Archive Files',
'Imagemaps',
'Binary Executables',
'Script Executables',
'Readme Files',
);
# Response Codes taken from <draft-ietf-http-v10-spec-01.ps>,
# August 3, 1995. Normally you don't need to change these!
%StatusCode = (
'200', '200 OK',
'201', '201 Created',
'202', '202 Accepted',
'203', '203 Non-Authoritative Information',
'204', '204 No Content',
'300', '300 Multiple Choices',
'301', '301 Moved Permanently',
'302', '302 Moved Temporarily',
'303', '303 See Other',
'304', '304 Not Modified',
'400', '400 Bad Request',
'401', '401 Unauthorized',
'402', '402 Payment Required',
'403', '403 Forbidden',
'404', '404 Not found',
'405', '405 Method Not Allowed',
'406', '406 None Acceptable',
'407', '407 Proxy Authorization Required',
'408', '408 Request Timeout',
'409', '409 Conflict',
'410', '410 Gone',
'411', '411 Authorization Refused',
'500', '500 Internal Server Errors',
'501', '501 Not implemented',
'502', '502 Bad Gateway',
'503', '503 Service Unavailable',
'504', '504 Gateway Timeout',
);
##############################################################################
# END CONFIG
##############################################################################
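The configuration tables above are only data, so it can help to see how a script might consult them. The following is a minimal sketch, not the statistics script's actual code: it assumes that the search order is walked from first to last, that matching is case-insensitive, and that unmatched requests fall into a catch-all group. The subroutine name classify_request and the 'Other' label are hypothetical, and only a small subset of the table is repeated here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small subset of the %FileTypes table; each value is a Perl regular expression.
my %FileTypes = (
    'CGI Scripts'        => '(\/cgi32\/|\/cgi-32\/|\/cgi-shl\/)',
    'Images'             => '\.(bmp|gif|xbm|jpg|jpeg)$',
    'HTML Files'         => '\.(htm|html)$',
    'Directory Listings' => '\/$',
);

# Search order. Hash keys are case-sensitive, so each name here
# must match a key of %FileTypes exactly.
my @FileTypesSort = ('HTML Files', 'Images', 'CGI Scripts', 'Directory Listings');

# Return the first file type whose pattern matches the requested path.
# (Hypothetical helper; 'Other' is an assumed catch-all label.)
sub classify_request {
    my ($path) = @_;
    foreach my $type (@FileTypesSort) {
        my $pattern = $FileTypes{$type};
        next unless defined $pattern;          # name without a pattern: skip it
        return $type if $path =~ /$pattern/i;  # case-insensitive match (assumption)
    }
    return 'Other';
}

print classify_request('/docs/index.html'), "\n";
print classify_request('/cgi32/mailto.pl'), "\n";
print classify_request('/images/'), "\n";
```

Because the lookup goes through the hash by name, a name in the search-order list that differs from its hash key only in capitalization would silently never match in a sketch like this one.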
The WWW MailTo & CommentTo gateway is a Windows NT HTTP CGI
Perl script. (Whew!) It enables you to send a message by SMTP
and/or to log the message to a local file. You can check Rick's
Web site for the latest and greatest (along with other resources
and documentation) at this URL:
http://rick.wzl.rwth-aachen.de/rick/
When called with the HTTP GET method,
the script creates a predefined or user-supplied fill-out form
whose ACTION attribute refers back to the script itself. After the form is submitted,
the script is executed a second time by the POST
method to create the mail and send it by SMTP if mail is enabled,
or save it in the comment file if comment is enabled.
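The two-pass flow just described can be sketched in a few lines of Perl. This is an illustration of the idea only, not the gateway's own code: the subroutine names mail_form and handle_request, the form layout, and the placeholder text returned for POST are all assumptions. REQUEST_METHOD and SCRIPT_NAME are standard CGI environment variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the fill-out form. ACTION points back at the script itself,
# so submitting the form runs the same script again, this time with POST.
sub mail_form {
    my ($script_url) = @_;
    return join "\n",
        qq{<FORM METHOD="POST" ACTION="$script_url">},
        q{To: <INPUT NAME="to"><BR>},
        q{Subject: <INPUT NAME="subject"><BR>},
        q{<TEXTAREA NAME="body" ROWS="10" COLS="60"></TEXTAREA><BR>},
        q{<INPUT TYPE="submit" VALUE="Send">},
        q{</FORM>};
}

# Choose the action from the request method (hypothetical helper).
sub handle_request {
    my ($method, $script_url) = @_;
    return mail_form($script_url) if $method eq 'GET';
    # A real script would create the mail and send or save it here.
    return '<P>Mail would be created and sent here.</P>' if $method eq 'POST';
    return '<P>Unsupported request method.</P>';
}

# Fall back to sample values when run outside a Web server.
my $method = $ENV{REQUEST_METHOD} || 'GET';
my $self   = $ENV{SCRIPT_NAME}    || '/cgi32/mailto.pl';

print "Content-type: text/html\n\n";
print handle_request($method, $self), "\n";
```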
The features depend on the configuration. The script can do any
of the following:
- Send mail by SMTP
- Save mail in a comment file
- Load a follow-up URL to the client browser automatically when
mailing is done
- Log all messages in a mail logfile
- Log all errors in an error logfile
- Log and append environment variable settings to mail or comment
file
- Test the correct execution of mail-sending operations
- Notify the user if mail sending fails
- Create user-specific form files that are Perl scripts
- Use predefined and user-defined form fields
Installation
You need to put mailto.pl
into your scripts or cgi-bin
directory. Some HTTP servers use a different CGI directory for
DOS CGI, Win32/NT CGI, or WinCGI binaries. If so, put the scripts
in your Win32/NT CGI binaries directory, for example, the CGI32
directory. If your HTTP server does not support ALIAS, the script must
be in your WWW data directory or one of its subdirectories.
Now would be a good time to install Blat from the CD-ROM, if you
have not done so already. (See Chapter 8,
"Serving E-mail via TCP/IP," for more information about
installing Blat.)
To install the WWW MailTo & CommentTo gateway, you only need
to modify the configuration as described in the following section
titled "Configuring the Script." Beyond the simple configuration,
the main issue is how to call it properly. This depends on how
your HTTP server executes scripts.
If your HTTP server can execute scripts directly (for example,
the Alibaba Web server), you can use HTML such as this:
<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">
If your HTTP server must execute a program binary (for example,
the EMWAC HTTPS), you can use HTML such as this:
<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/perl.exe?cgi32/mailto.pl">
If you are unfamiliar with any part of the syntax of the above
URL, please refer to Chapter 5, "What
You Need to Know About HTML," for a refresher course. The
question mark character is a special CGI marker indicating the
start of the command-line arguments to be passed in the QUERY_STRING
variable. (See Chapter 19.)
Alternatively, you can use Rick's CGI2Shell Gateway. In this case,
you could do the following:
<A HREF="http://rick.wzl.rwth-aachen.de:8001/cgi32/cgi2perl.exe/cgi32/mailto.pl?">
The last way is much easier if you want to specify parameters.
See the following "Usage" section for more information
about parameters.
Usage
First of all, you must create an HTML tag for the WWW MailTo & CommentTo
gateway in your HTML document, which calls the script by the GET
method. When called by the GET
method, the script displays a standard e-mail form. Here are two
examples of the HTML code, one with an absolute URL and one with a relative URL:
<A HREF="http://rick.wzl.rwth-aachen.de/cgi32/mailto.pl">Mailto</A>
<A HREF="/cgi32/mailto.pl">Mailto</A>
You can also include command-line parameters in the HTML tag. The
parameter is either the single word source or
one or more pairs of variables and values, with the pairs separated by ampersands (&).
Each variable is separated from its value by an equal sign (=).
Note that all parameters must be URL-encoded. That means that
all spaces are replaced with plus signs (+).
A literal plus sign must therefore be written in hexadecimal
as %2B. Other reserved characters must be encoded the same way,
each as a percent sign followed by its two-digit hexadecimal code.
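If you generate such links from a script, the encoding rules can be applied mechanically. Here is a minimal sketch, assuming plain ASCII input; the subroutine name url_encode is hypothetical and is not part of the gateway. It is deliberately conservative and escapes every character other than letters, digits, and a few safe punctuation marks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode one parameter value for use after the "?" in a URL.
# Assumes ASCII input; spaces become "+", everything else that is
# not a letter, digit, "_", ".", "~", or "-" becomes %XX.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~ -])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Build a link with a default address and a default subject.
my $href = '/cgi32/mailto.pl?to=' . url_encode('rickG@pobox.com')
         . '&subject=' . url_encode('This is a subject!');

print qq{<A HREF="$href">Mailto</A>\n};
```

This sketch also escapes the at sign and the exclamation mark (as %40 and %21). A CGI script that decodes its parameters in the usual way should treat the escaped and unescaped forms the same.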
The source parameter returns
the script source code if source viewing is enabled and source
is the only parameter. The variable/value pairs can set
any of the reserved variables except from
and HTTPpage. These variables
can be supplied in the GET
request when linking to the mailto script. If you simply want
your mail address to appear in the mail form as the default
value, make your HTML look something like this:
<A HREF="/cgi32/mailto.pl?to=rickG@pobox.com">
If you want your default subject to be "This is a subject!",
give the subject variable separated by an ampersand. For example:
<A HREF="/cgi32/mailto.pl?to=rickG@pobox.com&subject=This+is+a+subject!">
Notice that the subject must be URL-encoded.
Reserved Variables
There are several reserved variables that the script will check
for explicitly.
- to-Defines a default
mail address to send mail to. If mail is restricted to predefined
e-mail addresses by the variable %defto
(see the section on "Setting Default Values") and this
address is allowed, it will be the default address.
- cc-Defines the carbon
copy mail addresses (separated by commas). If mail is restricted
to predefined e-mail addresses by the variable %defto
(see the section on "Setting Default Values"), it will
be disabled.
- from-Defines the mail
address of the sender. If no value is given, the script tries
to determine the address by looking for the CGI environment variables
REMOTE_USER, HTTP_FROM,
and REMOTE_HOST.
- copyself-If the Copy
to Self check box in the form is checked, the sender's e-mail
address (the from variable) will
be added to cc.
- subject-Defines the subject
for the mail.
- body-Defines the body
text of the mail. There can be more than one body variable; they
will be concatenated.
- followup-Defines a follow-up
URL to retrieve after mail is sent. If this variable is undefined,
a default confirmation and "thank you" message will
be sent to the client.
- creator-Defines a default
creator message. If this variable is set, the message given by
"creator" will be added at the end of each mail.
- formfile-Defines the
filename of a user-specific form file that will be used instead
of the standard form. For security reasons, the path and extension
of the form file are fixed in the script configuration.
- HTTPpage-Source URL,
from which the mailto script is started. It will be automatically
detected if your HTTP server supports the HTTP_REFERER
CGI variable.
All of these variables (except from
and HTTPpage) can be given
default values, and those defaults can be protected against overwriting. All
of these variables can also be set on the command line following
the "?" (the text after it is passed to the script in the CGI environment
variable QUERY_STRING).
These reserved variables have a special meaning for the script
and must be set by either the Webmaster or the user. With the
exception of the to and from
variables, all variables are set to default values if they are
undefined.
For easy questionnaires, all other CGI variables will be logged
after the body portion-regardless of whether the values are hidden
or part of the fill-out form. Remember that the GET
method limits the number of characters that can be passed. Each variable
is separated from its value by =,
and different variable/value pairs are separated by &.
Spaces are replaced with +;
a literal plus sign must be written in hexadecimal as %2B, and other
reserved characters are encoded the same way with their own hexadecimal codes. Every non-reserved CGI variable will
be logged after the mail body in variable/value pairs. To use
user-defined variables, you first need to create a user-defined
form.
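To make the handling of reserved and non-reserved variables concrete, here is a minimal sketch of how a query string could be decoded and sorted into the two groups. It is not the gateway's own code. The subroutine names url_decode and split_variables are hypothetical, and the sketch assumes that reserved names are compared in lowercase exactly as listed above (apart from HTTPpage).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reserved names from the list above (spelling and case are assumptions).
my %reserved = map { $_ => 1 }
    qw(to cc from copyself subject body followup creator formfile HTTPpage);

# Reverse the URL encoding: "+" becomes a space, %XX becomes one character.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string into reserved mail variables and extra
# variable/value pairs that would be logged after the mail body.
sub split_variables {
    my ($query) = @_;
    my (%mail, @extra);
    foreach my $pair (split /&/, $query) {
        next if $pair eq '';
        my ($name, $value) = map { url_decode($_) } split /=/, $pair, 2;
        $value = '' unless defined $value;
        if (!$reserved{$name}) {
            push @extra, "$name=$value";
        } elsif ($name eq 'body' && defined $mail{body}) {
            $mail{body} .= $value;        # several body variables are concatenated
        } else {
            $mail{$name} = $value;
        }
    }
    return (\%mail, \@extra);
}

my ($mail, $extra) = split_variables(
    'to=rickG%40pobox.com&subject=This+is+a+subject%21&age=42');

print "to:      $mail->{to}\n";
print "subject: $mail->{subject}\n";
print "logged:  @$extra\n";
```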
Configuring the Script
Before starting to use the script, you must configure it. All
configurable variables are in the first section of the script,
as follows:
- $mailprogram-Filename
of mail utility. If it is not in the path, you must specify the
whole path.
- $mailfile-Location (including
filename) of temporary mail file. If this variable is set, mail
will be sent by SMTP.
- $commentfile-Location
(including filename) of comment file. If this variable is set,
mail will be written to the local comment file.
- $saveenv-Save environment
setting. If this variable is set to 1, the environment variables
will be saved to the comment file. If this variable is set to
2, the environment variables will be saved to the comment file
and appended to the end of the mail file.
- $timezone-Set this to
your local time zone (for example "MET" or "GMT+1").
- $logfile-Location (including
filename) of application log file. If this variable is not set,
no logging is done.
- $errorfile-Location (including
filename) of error file. If this variable is not set, no error
logging is done.
- $source-Location (including
filename) of script file. If this variable is not set, source
viewing is not possible.
- $formdir-Directory for
form files. It must end with "/" or "\\".
- $formext-Extension for
form file.
- $HTTPsource-Location
of script file in HTTP format. The variable will be detected automatically
if it is not set and your Web server supports direct script execution.
- $libfiles-Location of
Perl libraries.
- $ContactHTMLtag-Contact
URL for server/script problems in HTML format.
- $HTMLBackground-User-specific
backgrounds for all returns. Here, you can set all elements of
the <BODY> tag, which
can appear between <BODY ... >
in HTML format.
- @HTMLHeader-User-specific
header for all returned HTML pages in HTML format.
- @HTMLAddress-User-specific
footer for all returned HTML pages in HTML format.
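As an illustration, a filled-in configuration section might look something like the following sketch. Every value shown is hypothetical: the paths, the log file names, and the form extension (including whether it carries a leading dot) are assumptions that you must replace with the values for your own installation. The checks at the end are not part of the gateway; they simply show the constraints described in the list above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values only; every path below is an assumption.
my $mailprogram = 'blat.exe';                 # mail utility, assumed to be in the path
my $mailfile    = 'C:/TEMP/mailto.tmp';       # set, so mail is sent by SMTP
my $commentfile = 'C:/WWW/LOGS/comment.txt';  # set, so mail is also saved locally
my $saveenv     = 1;                          # 1 = save environment to the comment file
my $timezone    = 'GMT+1';
my $logfile     = 'C:/WWW/LOGS/mailto.log';
my $errorfile   = 'C:/WWW/LOGS/mailto.err';
my $formdir     = 'C:/WWW/FORMS/';            # must end with "/" or "\\"
my $formext     = '.frm';                     # assumed to include the dot

# Collect configuration problems instead of stopping at the first one.
my @problems;
push @problems, '$formdir must end with / or \\'
    unless $formdir =~ m{[/\\]\z};
push @problems, '$saveenv must be 0, 1, or 2'  # 0 is assumed to mean "off"
    unless $saveenv =~ /\A[012]\z/;
push @problems, 'set $mailfile or $commentfile, or nothing will be delivered'
    unless $mailfile || $commentfile;

print @problems
    ? join("\n", @problems) . "\n"
    : "Configuration looks consistent\n";
```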
Setting Default Values
You can set default values for all reserved variables (except from
and HTTPpage) by configuring
the $def{}
variables in the script. All of these variables can also be
found in the first section of the script. If the variable $default
is set, these values are fixed. They cannot be overwritten
by parameters given in the script tag in an HTML page or by user
input when filling out the form. If $default
is not set, the default values are used only if the reserved
variables are not set by command-line parameters or user form
input. The variables are as follows:
- $default-If this variable
is set, the default values (see the $def{}
variables in this list) are used and cannot be
overwritten through script parameters in a GET
request or through CGI form input. If this variable is not set, the default values
are used only if there are no other values.
- $def{'to'}-Default value
for variable to.
- $def{'cc'}-Default value
for variable cc.
- $def{'copyself'}-Default
value for variable copyself
(value 0 or 1).
- $def{'subject'}-Default
value for variable subject.
- $def{'body'}-Default
value for variable body.
- $def{'followup'}-Default
value for variable followup.
- $def{'form'}-Default
value for variable form.
This is a form filename without path and extension.
- $def{'creator'}-Default
value for variable creator.
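The precedence between configured defaults and supplied values can be summarized in a short sketch. This is one reading of the rules above, not the gateway's own code: the subroutine name effective_value is hypothetical, and the sketch assumes that a fixed default applies only to variables that actually have a $def{} entry and that an empty form field counts as "not set."

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide which value a reserved variable ends up with.
#   $name     - variable name, for example 'to'
#   $supplied - hash ref of values from the URL or the form
#   $def      - hash ref of configured defaults
#   $default  - true if the defaults are fixed
sub effective_value {
    my ($name, $supplied, $def, $default) = @_;

    # Fixed defaults always win (assumed to apply only where a default exists).
    return $def->{$name} if $default && defined $def->{$name};

    # Otherwise a supplied, non-empty value takes precedence.
    return $supplied->{$name}
        if defined $supplied->{$name} && $supplied->{$name} ne '';

    # Fall back to the default (which may be undefined).
    return $def->{$name};
}

my %def      = (to => 'webmaster@example.com', subject => 'Feedback');
my %supplied = (to => 'someone@example.com');

print 'fixed:   ', effective_value('to', \%supplied, \%def, 1), "\n";
print 'open:    ', effective_value('to', \%supplied, \%def, 0), "\n";
print 'subject: ', effective_value('subject', \%supplied, \%def, 0), "\n";
```

With $default set, the first call returns the configured address no matter what the link or the form supplied. With $default unset, the supplied address is used, and the default subject only fills the gap.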
Restricted Mail Addresses
You can restrict mail addresses to one address if you set the
$def{'to'} variable to an
e-mail address and prevent overwriting of this value by setting
the $default variable.
You can also restrict the to
mail addresses to certain addresses by setting the %defto
associative array. This variable can be found in the first section
of the script. For this feature, you must run a separate copy
of the script because the standard form then always includes a selection
list for the addresses.
User-Defined Forms
You can create your own forms without modifying the script. You
must define form files, which are themselves small Perl scripts. You
can create two kinds of form files. The first is executed
when the main script is executed with the GET
method. It must create the form. If the second form file exists, it
is executed when the main script is executed with the POST
method (after the user has submitted the mail). It is intended for
preparing the mail. To use the form file feature, the first (GET)
form file must exist. The second (POST)
is optional.
You can specify the name of the form with the predefined variable
$def{'form'}=form name inside
the script or with the parameter form=form
name. Form name is the filename of the form without
the path and the file extension. The form directory ($formdir)
and the form extension ($formext) must be configured in the script. If they
are not configured, the forms will not be executed. This is for
security reasons. The form files are executed with the eval
function of Perl. Therefore, use a separate directory for the form
files. If you don't do this, other files could also be executed!
Inside your form files, you can use all the variables and subroutines
of the main Perl script. You can overwrite variables from the
main script, for example $commentfile.
You can even write your own mailto application.
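Because form files are run through eval, it is worth seeing why the form name is kept free of path and extension. The following sketch shows one way a script could turn a form name into a file path and then execute the file. It is not the gateway's own code: the subroutine names form_path and run_form_file are hypothetical, and the rule that a form name may contain only letters, digits, and underscores is an assumption made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the full path of a form file from its bare name.
# Returns undef for names that could escape the form directory.
sub form_path {
    my ($name, $formdir, $formext) = @_;
    return unless defined $name && $name =~ /\A[A-Za-z0-9_]+\z/;
    return $formdir . $name . $formext;
}

# Read a form file and execute it with eval.
# Returns an empty string on success or an error message.
sub run_form_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return "cannot open $path: $!";
    local $/;                 # read the whole file at once
    my $code = <$fh>;
    close $fh;
    eval $code;               # the form file runs with the script's own privileges
    return $@ ? "form file failed: $@" : '';
}

# Hypothetical directory and extension.
my ($dir, $ext) = ('C:/WWW/FORMS/', '.frm');

foreach my $name ('survey', '../cgi32/mailto') {
    my $path = form_path($name, $dir, $ext);
    print defined $path ? "$name -> $path\n" : "$name -> rejected\n";
}
```

Keeping the directory and extension in the configuration, and accepting only a bare name from the URL, means that a visitor cannot point the eval call at an arbitrary file on the server.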
CGI Form Handling in Perl
As mentioned before, another excellent use for Perl is writing
code to manage the Common Gateway Interface (CGI) forms, which
have become the mainstay of the World Wide Web for interactive
communication.
cgi-lib.pl is a simple Perl
library designed to make writing CGI scripts in Perl easy. Many
Perl CGI scripts that you find on the Web use cgi-lib.pl.
See Listing 29.2 for an example.
Listing 29.2. A minimal Perl application using cgi-lib.pl.
#!/usr/local/bin/perl
# minimal.cgi
# Copyright (C) 1995 Steven E. Brenner
# $Header: /cys/people/brenner/http/docs/web/RCS/minimal.cgi,v 1.2 1995/04/07 21:36:29 brenner Exp $
# This is the minimalist script to demonstrate the use of
# the cgi-lib.pl library -- it needs only 7 lines
# --
# This is NOT intended to be a "typical" script
# Most importantly, the <form> key should normally have parameters like
# <form method=POST action="minimal.cgi">
require "cgi-lib.pl";
if (&ReadParse(*input)) {
    print &PrintHeader, &PrintVariables(%input);
} else {
    print &PrintHeader,'<form><input type="submit">Data: <input name="myfield">';
}
Perl 5
Perl 5 adds more features to the language than this short
introduction can cover in full. Some of the more
noteworthy enhancements are references, object-oriented extensions,
general cleanup, support for modules, and importing.
Like any programming language, Perl will take some time to master.
Alas, this is not a subject I can completely cover in this book.
However, I can give you some information about where to look.
This information will also tell you how you can quickly use existing
Perl applications. The first thing you might want to do is check
out these three text files that come with Perl:
- relnotes.txt-For general
information about Perl for NT and how Perl for NT differs from
Perl for UNIX.
- status.txt-For information
on what features are supported.
- registry.txt-For information
on using the registry access features.
To learn more about Perl, try the University of Florida's Perl
Archive at http://www.cis.ufl.edu/perl/.
Users in the UK might like to try something closer to home, such
as the NEXOR Ltd Perl Page at http://pubweb.nexor.co.uk/public/perl/perl.html.
Here are a few other Perl resources on the Net; the last one consists
of a few newsgroups dedicated to Perl topics.
http://www.metronet.com/perlinfo/perl5.html
http://www.perl.com/perl/faq/comp.lang.perl
It has been a very educational experience writing this book. I
hope it has been, and will continue to be, as useful for you as
it has been fun to write.
You have chosen a very exciting time to be involved with Windows
NT and Web technologies. I wish you continued success on your
Windows NT Intranet.

Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.